Varying Slopes Models

Extending mixed-effects models by allowing the relationship (slope) to vary across different groups.

General Principles

To model the relationship between predictor variables and a dependent variable while allowing for varying effects across groups or clusters, we use a varying slopes model.

This approach is useful when we expect the relationship between predictors and the dependent variable to differ across groups (e.g., different slopes for different subjects, locations, or time periods). This allows every unit in the data to have its own response to any treatment, exposure, or event, while also improving estimates through partial pooling.
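
Concretely, writing k(i) for the group of observation i, the linear predictor takes the form:

\mu_i = \alpha_{k(i)} + \beta_{k(i)} x_i

where both the intercept \alpha_k and the slope \beta_k are estimated for each group and partially pooled toward their global means.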

Considerations

Note
  • We have the same considerations as for varying-intercepts models (see 12. Varying intercepts).

  • The idea is similar to categorical models, where a slope is specified for each category. Here, however, we also estimate how the varying effects relate to one another (e.g., the correlation between intercepts and slopes). This leads to a different mathematical approach: to capture these relationships, we model a covariance matrix.

  • To construct the covariance matrix, we use an SRS decomposition, where S is a diagonal matrix of standard deviations and R is a correlation matrix (illustrated in the sketch after this list). To model the correlation matrix, we use an LKJcorr distribution parameterized by a single parameter η that controls the amount of regularization. η is usually set to 2 to define a weakly informative prior that is skeptical of extreme correlations near −1 or 1. With LKJcorr(1), the prior is flat over all valid correlation matrices; with values greater than 1, extreme correlations become less likely.

  • The standard deviations in S are modeled with a prior that constrains them to strictly positive values (e.g., Exponential(1)).
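
As a quick sketch of the SRS decomposition (plain JAX, with hypothetical values for the standard deviations and correlations), the matrix product S R S equals the elementwise form jnp.outer(sigma, sigma) * R used in the example code below:

import jax.numpy as jnp

# Hypothetical values: sd of intercepts, sd of slopes, and their correlation
sigma = jnp.array([1.0, 0.5])                 # diagonal of S
R = jnp.array([[1.0, -0.7],
               [-0.7, 1.0]])                  # correlation matrix (e.g., a draw from LKJcorr(2))

# SRS decomposition: Sigma = S @ R @ S
S = jnp.diag(sigma)
Sigma = S @ R @ S

# Equivalent elementwise form used in the model code below
Sigma_outer = jnp.outer(sigma, sigma) * R
assert bool(jnp.allclose(Sigma, Sigma_outer))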

Example

Below are example code snippets in Python, R, and Julia demonstrating Bayesian regression with varying effects. The example is based on McElreath (2018).

Simulated data

Python

from BI import bi, jnp
# Setup device------------------------------------------------
m = bi(platform='cpu')

# Import Data & Data Manipulation ------------------------------------------------
# Import
data_path = m.load.sim_multivariate_normal(only_path=True)
m.data(data_path, sep=',') 
m.data_on_model = dict(
    cafe = jnp.array(m.df.cafe.values, dtype=jnp.int32),
    wait = jnp.array(m.df.wait.values, dtype=jnp.float32),
    N_cafes = len(m.df.cafe.unique()),
    afternoon = jnp.array(m.df.afternoon.values, dtype=jnp.float32)
)

# Define model ------------------------------------------------
def model(cafe, wait, N_cafes, afternoon):
    a = m.dist.normal(5, 2,  name = 'a')
    b = m.dist.normal(-1, 0.5, name = 'b')
    sigma_cafe = m.dist.exponential(1, shape=(2,),  name = 'sigma_cafe')
    sigma = m.dist.exponential( 1,  name = 'sigma')
    Rho = m.dist.lkj(2, 2, name = 'Rho')  # prior over the 2x2 correlation matrix
    cov = jnp.outer(sigma_cafe, sigma_cafe) * Rho  # SRS covariance: diag(S) R diag(S)
    a_cafe_b_cafe = m.dist.multivariate_normal(jnp.stack([a, b]), cov, shape = [N_cafes], name = 'a_b_cafe')  # joint varying intercepts and slopes

    a_cafe, b_cafe = a_cafe_b_cafe[:, 0], a_cafe_b_cafe[:, 1]
    mu = a_cafe[cafe] + b_cafe[cafe] * afternoon
    m.dist.normal(mu, sigma, obs=wait)

# Run sampler ------------------------------------------------
m.fit(model) 

# Summary ------------------------------------------------
m.summary()

Using the varying-effects helper

The same model can be written more compactly with the built-in m.effects.varying_effects helper, which constructs the varying intercepts and slopes for us (here with centered = False, i.e., the non-centered parameterization described under Mathematical Details):

from BI import bi, jnp

# Setup device------------------------------------------------
m = bi(platform='cpu')

# Import Data & Data Manipulation ------------------------------------------------
# Import
data_path = m.load.sim_multivariate_normal(only_path=True)
m.data(data_path, sep=',') 
m.data_on_model = dict(
    cafe = jnp.array(m.df.cafe.values, dtype=jnp.int32),
    wait = jnp.array(m.df.wait.values, dtype=jnp.float32),
    N_cafes = len(m.df.cafe.unique()),
    afternoon = jnp.array(m.df.afternoon.values, dtype=jnp.float32)
)


# Define model ------------------------------------------------
def model(cafe, wait, N_cafes, afternoon):

    sigma = m.dist.exponential( 1,  name = 'sigma')

    varying_intercept, varying_slope = m.effects.varying_effects(
        N_vars = 1,
        N_group = N_cafes,
        group_id = cafe,
        group_name = 'cafe',
        centered = False)
    

    mu = varying_intercept + varying_slope * afternoon
    m.dist.normal(mu, sigma, obs=wait)

# Run sampler ------------------------------------------------
m.fit(model)

# Summary ------------------------------------------------
m.summary()

R

library(BayesianInference)
jnp = reticulate::import('jax.numpy')

# Setup platform------------------------------------------------
m=importBI(platform='cpu')

# Import data ------------------------------------------------
m$data(paste(system.file(package = "BayesianInference"),"/data/Sim data multivariatenormal.csv", sep = ''), sep=',')
m$data_to_model(list('cafe', 'wait', 'afternoon'))

# Define model ------------------------------------------------
model <- function(cafe, afternoon, wait, N_cafes = as.integer(20) ){
  a = bi.dist.normal(5, 2, name = 'a')
  b = bi.dist.normal(-1, 0.5, name = 'b')
  sigma_cafe = bi.dist.exponential(1, shape= c(2), name = 'sigma_cafe')
  sigma = bi.dist.exponential( 1, name = 'sigma')
  Rho = bi.dist.lkj(as.integer(2), as.integer(2), name = 'Rho')  # prior over the 2x2 correlation matrix
  cov = jnp$outer(sigma_cafe, sigma_cafe) * Rho                  # SRS covariance: diag(S) R diag(S)
  
  a_cafe_b_cafe = bi.dist.multivariate_normal(
    jnp$squeeze(jnp$stack(list(a, b))), 
    cov, 
    shape = c(N_cafes), name = 'a_b_cafe')
  
  a_cafe = a_cafe_b_cafe[, 0]
  b_cafe = a_cafe_b_cafe[, 1]
  
  mu = a_cafe[cafe] + b_cafe[cafe] * afternoon
  
  bi.dist.normal(mu, sigma, obs=wait)
}

# Run MCMC ------------------------------------------------
m$fit(model) # Optimize model parameters through MCMC sampling

# Summary ------------------------------------------------
m$summary() # Get posterior distribution

Julia

using BayesianInference

# Setup device------------------------------------------------
m = importBI(platform="cpu")

# Import Data & Data Manipulation ------------------------------------------------
# Import
data_path = m.load.sim_multivariate_normal(only_path = true)

m.data(data_path, sep=',') 
m.data_on_model = pybuiltins.dict(
    cafe = jnp.array(m.df.cafe.values, dtype=jnp.int32),
    wait = jnp.array(m.df.wait.values, dtype=jnp.float32),
    N_cafes = length(m.df.cafe.unique()),    
    afternoon = jnp.array(m.df.afternoon.values, dtype=jnp.float32)
)


# Define model ------------------------------------------------
@BI function model(cafe, wait, N_cafes, afternoon)
    a = m.dist.normal(5, 2,  name = "a")
    b = m.dist.normal(-1, 0.5, name = "b")
    sigma_cafe = m.dist.exponential(1, shape=(2,),  name = "sigma_cafe")
    sigma = m.dist.exponential( 1,  name = "sigma")
    Rho = m.dist.lkj(2, 2, name = "Rho")
    cov = jnp.outer(sigma_cafe, sigma_cafe) * Rho
    a_cafe_b_cafe = m.dist.multivariate_normal(jnp.stack([a, b]), cov, shape = [N_cafes], name = "a_b_cafe")    

    a_cafe, b_cafe = a_cafe_b_cafe[:, 0], a_cafe_b_cafe[:, 1]
    mu = a_cafe[cafe] + b_cafe[cafe] * afternoon
    m.dist.normal(mu, sigma, obs=wait)

end

# Run mcmc ------------------------------------------------
m.fit(model)  # Optimize model parameters through MCMC sampling

# Summary ------------------------------------------------
m.summary() # Get posterior distributions

Mathematical Details

Centered parameterization

The varying slopes model is a hierarchical model in which each of the K groups has its own intercept and slope, and these pairs are drawn jointly from a common multivariate Gaussian distribution.

The varying intercepts (\alpha_k) and slopes (\beta_k) are modeled using a Multivariate Normal distribution:

\begin{pmatrix} \alpha_k \\ \beta_k \end{pmatrix} \sim \text{MultivariateNormal}\left( \begin{pmatrix} \bar{\alpha} \\ \bar{\beta} \end{pmatrix}, \text{diag}(\varsigma) ~ \Omega ~ \text{diag}(\varsigma) \right)

\bar{\alpha} \sim \text{Normal}(0, 1)

\bar{\beta} \sim \text{Normal}(0, 1)

\varsigma \sim \text{Exponential}(1)

\Omega \sim \text{LKJ}(\eta)

Where:

  • \left(\begin{array}{cc} \bar{\alpha} \\ \bar{\beta} \end{array}\right) is the vector of global parameters: the global intercept \bar{\alpha} concatenated with the global slope(s) \bar{\beta}.

  • \varsigma is a vector of the standard deviations of the varying intercepts and slopes across groups.

  • \Omega is the correlation matrix.

Non-centered parameterization

For computational reasons, it is often better to implement a non-centered parameterization, which is equivalent to the Multivariate Normal approach above:

\left(\begin{array}{cc} \alpha_k \\ \beta_k\end{array}\right) = \left(\begin{array}{cc} \bar{\alpha} \\ \bar{\beta} \end{array}\right) + \varsigma\circ \left( L \cdot \left( \begin{array}{cc} \widehat{\alpha}_k \\ \widehat{\beta}_k \end{array} \right) \right)

\bar{\alpha} \sim \text{Normal}(0, 1)

\bar{\beta} \sim \text{Normal}(0, 1)

\varsigma \sim \text{Exponential}(1)

L \sim \text{LKJ Cholesky}(\eta)

\widehat{\alpha}_k \sim \text{Normal}(0, 1)

\widehat{\beta}_k \sim \text{Normal}(0, 1)

Where:

  • \sigma_\alpha \sim \text{Exponential}(1) is the prior standard deviation among intercepts (the first component of \varsigma).

  • \sigma_\beta \sim \text{Exponential}(1) is the prior standard deviation among slopes (the second component of \varsigma).

  • L \sim \text{LKJ Cholesky}(\eta) is the Cholesky factor of the correlation matrix.

  • \widehat{\alpha}_k and \widehat{\beta}_k are the standardized (z-scored) varying effects.
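
As a minimal sketch of the non-centered transform (plain JAX with hypothetical values; in the model itself, L comes from an LKJ Cholesky prior and the z-scores from Normal(0, 1) priors):

import jax
import jax.numpy as jnp

mean = jnp.array([3.0, -1.0])                 # (alpha_bar, beta_bar)
sigma = jnp.array([1.0, 0.5])                 # varsigma: sds of intercepts and slopes
R = jnp.array([[1.0, -0.7], [-0.7, 1.0]])
L = jnp.linalg.cholesky(R)                    # Cholesky factor of the correlation matrix

# Standardized varying effects, one row per group: z_k ~ Normal(0, 1)
z = jax.random.normal(jax.random.PRNGKey(0), (20, 2))

# (alpha_k, beta_k) = (alpha_bar, beta_bar) + varsigma o (L . z_k), vectorized over groups
effects = mean + sigma * (z @ L.T)
a_cafe, b_cafe = effects[:, 0], effects[:, 1]

Because z enters the model as independent Normal(0, 1) draws, the sampler explores a much simpler geometry than in the centered version, while the implied distribution of (\alpha_k, \beta_k) is unchanged.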

Multivariate Model with One Random Slope for Each Variable

We can apply a multivariate model similarly to Chapter 2. In this case, we apply the same principle, but with a covariance matrix whose dimension equals the number of varying effects we define. For example, to give each group k its own intercept and slopes in a model with two independent variables X_1 and X_2 (writing k(i) for the group of observation i), we can define the model as follows:

Y_{i} \sim \text{Normal}(\mu_i , \sigma)

\mu_i = \alpha_{k(i)} + \beta_{k(i)} X_{1i} + \gamma_{k(i)} X_{2i}


\begin{pmatrix} \alpha_k\\ \beta_k\\ \gamma_k \end{pmatrix} = \begin{pmatrix} \bar{\alpha}\\ \bar{\beta}\\ \bar{\gamma} \end{pmatrix} + \varsigma \circ \left( L \cdot \begin{pmatrix} \widehat{\alpha}_{k} \\ \widehat{\beta}_{k} \\ \widehat{\gamma}_{k} \end{pmatrix} \right)

\bar{\alpha} \sim \text{Normal}(0, 1)

\bar{\beta} \sim \text{Normal}(0, 1)

\bar{\gamma} \sim \text{Normal}(0, 1)

\varsigma \sim \text{Exponential}(1)

L \sim \text{LKJ Cholesky}(2)

\widehat{\alpha}_k \sim \text{Normal}(0, 1)

\widehat{\beta}_k \sim \text{Normal}(0, 1)

\widehat{\gamma}_k \sim \text{Normal}(0, 1)
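
A hedged sketch of this three-dimensional model, written in the explicit Python style of the Example above (assuming, as in the two-dimensional example, that the first argument of m.dist.lkj is the matrix dimension and the second is η; the names group, X1, X2, and y are hypothetical):

def model(group, X1, X2, y, N_groups):
    a = m.dist.normal(0, 1, name = 'a')
    b = m.dist.normal(0, 1, name = 'b')
    g = m.dist.normal(0, 1, name = 'g')
    sigma_group = m.dist.exponential(1, shape=(3,), name = 'sigma_group')
    sigma = m.dist.exponential(1, name = 'sigma')
    Rho = m.dist.lkj(3, 2, name = 'Rho')                 # 3x3 correlation matrix, eta = 2
    cov = jnp.outer(sigma_group, sigma_group) * Rho      # 3x3 SRS covariance
    abg = m.dist.multivariate_normal(jnp.stack([a, b, g]), cov, shape = [N_groups], name = 'abg')

    a_g, b_g, g_g = abg[:, 0], abg[:, 1], abg[:, 2]
    mu = a_g[group] + b_g[group] * X1 + g_g[group] * X2
    m.dist.normal(mu, sigma, obs=y)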

Notes

Note
  • We can apply interaction terms similarly to Chapter 3.

  • We can apply categorical variables similarly to Chapter 4.

  • We can apply multiple random effects in Bayesian inference as follows:

def model(cafe, wait, N_cafes, afternoon, state):
    sigma = m.dist.exponential(1, name = 'sigma')

    # ⚠️ Note: `alpha_bar` and `beta_bar` are set to zero to ensure the same intercepts for all varying effects. ⚠️
    varying_intercept_1, varying_slope_1 = m.effects.varying_effects(
        N_vars = 1,
        N_group = N_cafes,
        group_id = cafe,
        group_name = 'cafe',
        alpha_bar = jnp.array([0]),
        beta_bar = jnp.array([0]),
        centered = True,
    )

    varying_intercept_2, varying_slope_2 = m.effects.varying_effects(
        N_vars = 1,
        N_group = 4,               # number of states in the data
        group_id = state,
        group_name = 'state',
        alpha_bar = jnp.array([0]),
        beta_bar = jnp.array([0]),
        centered = True,
    )

    # The two varying effects combine additively on the linear predictor
    varying_intercept = varying_intercept_1 + varying_intercept_2
    varying_slope = varying_slope_1 + varying_slope_2

    mu = varying_intercept + varying_slope[:, 0] * afternoon
    m.dist.normal(mu, sigma, obs=wait)

Where state is a second grouping variable (cafes clustered by state) with its own varying intercepts and slopes.


Reference(s)

McElreath, Richard. 2018. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Chapman & Hall/CRC.